Applying the Espresso Algorithm to Large Parsed Corpora

Authors

  • Gosse Bouma
  • John Nerbonne

Abstract

Information extraction (IE) culls information from text, including relations (our focus here) such as head-of(Sergej-Brin, Google). The Espresso algorithm was developed for this task (Pantel & Pennacchiotti 2006). We extend that work in two ways: first, by using as input not raw text but syntactic analyses derived from it, and second, by applying the algorithm to Dutch. This required parsing hundreds of millions of words of text, a task that was regarded as infeasible only ten years ago.
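
The abstract does not spell out how Espresso ranks patterns and instances. Assuming the reliability formulation of Pantel & Pennacchiotti (2006), the two scores are defined in terms of each other: a pattern's reliability is the PMI-weighted average reliability of the instances it extracts, and an instance's reliability is the PMI-weighted average reliability of the selected patterns that extract it, with PMI divided by its maximum over all observed pairs. The Python sketch below shows one round of that scoring. It is a minimal illustration, not the authors' implementation: the counts, the string patterns (standing in for the dependency-path patterns one would derive from parsed text), and all function names are hypothetical, unseen pairs get a PMI of 0, and the discounting used in the original is omitted.

```python
import math
from collections import Counter

# Hypothetical co-occurrence counts |x, p, y|: (instance, pattern) -> frequency.
# In a parsed-corpus setting the patterns would be dependency paths, not strings.
COUNTS = Counter({
    (("Brin", "Google"), "X head-of Y"): 5,
    (("Jobs", "Apple"), "X head-of Y"): 5,
    (("Brin", "Google"), "X visited Y"): 1,
    (("Obama", "Berlin"), "X visited Y"): 5,
    (("Pope", "Rome"), "X visited Y"): 5,
})

def pmi(counts, inst, pat):
    """PMI of an instance and a pattern, estimated from relative frequencies.
    Unseen pairs get 0.0 (a simplification made for this sketch)."""
    joint = counts[(inst, pat)]  # Counter returns 0 for missing keys
    if joint == 0:
        return 0.0
    total = sum(counts.values())
    inst_total = sum(c for (i, _), c in counts.items() if i == inst)
    pat_total = sum(c for (_, p), c in counts.items() if p == pat)
    return math.log(joint * total / (inst_total * pat_total))

def max_pmi(counts):
    """Largest PMI over all observed (instance, pattern) pairs."""
    return max(pmi(counts, i, p) for (i, p) in counts)

def pattern_reliability(counts, pat, inst_rel, mx):
    # r_pi(p): normalized-PMI-weighted average of the current instance reliabilities.
    return sum(pmi(counts, i, pat) / mx * r for i, r in inst_rel.items()) / len(inst_rel)

def instance_reliability(counts, inst, pat_rel, mx):
    # r_iota(i): normalized-PMI-weighted average of the selected pattern reliabilities.
    return sum(pmi(counts, inst, p) / mx * r for p, r in pat_rel.items()) / len(pat_rel)

def espresso_iteration(counts, seeds, k=1):
    """One bootstrapping round: score patterns from the seeds, keep the top k,
    then rescore every candidate instance against the kept patterns."""
    mx = max_pmi(counts)
    inst_rel = {s: 1.0 for s in seeds}  # seed instances are treated as fully reliable
    patterns = {p for (_, p) in counts}
    pat_scores = {p: pattern_reliability(counts, p, inst_rel, mx) for p in patterns}
    best = dict(sorted(pat_scores.items(), key=lambda kv: -kv[1])[:k])
    instances = {i for (i, _) in counts}
    inst_scores = {i: instance_reliability(counts, i, best, mx) for i in instances}
    return pat_scores, inst_scores

if __name__ == "__main__":
    pats, insts = espresso_iteration(COUNTS, [("Brin", "Google"), ("Jobs", "Apple")])
    for p, r in sorted(pats.items(), key=lambda kv: -kv[1]):
        print(f"{r:+.3f}  {p}")
```

With the two seeds, the sketch prints +0.877 for `X head-of Y` and -0.772 for `X visited Y`: the second pattern co-occurs with a seed less often than chance would predict, so it receives a negative reliability and is not kept for the next round.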

Similar sources

Applying the Espresso-algorithm to large parsed corpora

Information extraction systems learn patterns for extracting pairs that instantiate a given relation from text. For instance, for the relation capital of, a system might learn extraction patterns such as ‘Arg1 is capital of Arg2’ or ‘The ambassador of Arg2 was called back to Arg1’. Lightly supervised information extraction systems learn extraction patterns by means of a bootstrapping procedure, wh...
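
The bootstrapping loop described above can be sketched with plain string patterns. The following Python sketch is a hypothetical illustration, not the cited system: the toy corpus, the seed pair, and the function names are assumptions, patterns must match whole sentences, and arguments are single tokens. It induces patterns such as ‘Arg1 is capital of Arg2’ from sentences containing a known pair, then applies them to harvest new pairs. Unlike Espresso, it does no reliability scoring, so on real text it would be prone to accumulating noisy patterns and pairs.

```python
import re

# Hypothetical toy corpus; a real system would process millions of sentences.
CORPUS = [
    "Paris is capital of France",
    "Berlin is capital of Germany",
    "Rome is capital of Italy",
    "The ambassador of France was called back to Paris",
    "The ambassador of Spain was called back to Madrid",
]

def induce_patterns(corpus, pairs):
    """Turn each sentence containing a known pair into a pattern with Arg1/Arg2 slots."""
    patterns = set()
    for sent in corpus:
        tokens = sent.split()
        for a1, a2 in pairs:
            if a1 in tokens and a2 in tokens:
                slotted = ["Arg1" if t == a1 else "Arg2" if t == a2 else t for t in tokens]
                patterns.add(" ".join(slotted))
    return patterns

def pattern_to_regex(pattern):
    """Escape the literal words and turn the two slots into named capture groups."""
    rx = re.escape(pattern)
    rx = rx.replace("Arg1", r"(?P<a1>\w+)").replace("Arg2", r"(?P<a2>\w+)")
    return re.compile(rx)

def extract(corpus, patterns):
    """Apply every pattern to every sentence and collect the (Arg1, Arg2) matches."""
    pairs = set()
    for pat in patterns:
        rx = pattern_to_regex(pat)
        for sent in corpus:
            m = rx.fullmatch(sent)
            if m:
                pairs.add((m.group("a1"), m.group("a2")))
    return pairs

def bootstrap(corpus, seeds, iterations=2):
    """Alternate between inducing patterns from known pairs and extracting new pairs."""
    pairs, patterns = set(seeds), set()
    for _ in range(iterations):
        patterns |= induce_patterns(corpus, pairs)
        pairs |= extract(corpus, patterns)
    return pairs, patterns

if __name__ == "__main__":
    pairs, patterns = bootstrap(CORPUS, [("Paris", "France")])
    print(sorted(patterns))
    print(sorted(pairs))
```

Starting from the single seed ("Paris", "France"), the loop learns both example patterns and extracts Berlin/Germany and Rome/Italy through the first one and Madrid/Spain through the ambassador pattern.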

Parsed Corpora for Linguistics

Knowledge-based parsers are now accurate, fast and robust enough to be used to obtain syntactic annotations for very large corpora fully automatically. We argue that such parsed corpora are an interesting new resource for linguists. The argument is illustrated by means of a number of recent results which were established with the help of parsed corpora.

Using Parsed Corpora for Estimating Stochastic Inversion Transduction Grammars

An important problem when using Stochastic Inversion Transduction Grammars is their computational cost. More specifically, when dealing with corpora such as Europarl, even a single iteration of the estimation algorithm becomes prohibitive. In this work, we reduce this cost by taking advantage of the bracketing information in parsed corpora and show machine translation results obtained with ...

A Synchronous Context Free Grammar for Time Normalization

We present an approach to time normalization (e.g. the day before yesterday⇒2013-04-12) based on a synchronous context free grammar. Synchronous rules map the source language to formally defined operators for manipulating times (FindEnclosed, StartAtEndOf, etc.). Time expressions are then parsed using an extended CYK+ algorithm, and converted to a normalized form by applying the operators recur...
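
Recursive application of time operators can be illustrated in miniature. The Python sketch below is hypothetical: it does not implement the paper's synchronous grammar, its CYK+ parser, or its operators (FindEnclosed, StartAtEndOf, etc.). Instead it uses a hand-written recursive parser with three toy rules and two invented operators, Anchor and Minus, and it assumes an anchor (document) date of 2013-04-14, which is the date the example's output implies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Union

@dataclass
class Anchor:
    """Hypothetical operator: the anchor time, e.g. the document creation date."""

@dataclass
class Minus:
    """Hypothetical operator: move `base` back by `days` days."""
    base: "Expr"
    days: int

Expr = Union[Anchor, Minus]

def parse(words: List[str]) -> Expr:
    """Map a tokenized expression to an operator tree (toy grammar, three rules)."""
    if words == ["today"]:
        return Anchor()
    if words == ["yesterday"]:
        return Minus(Anchor(), 1)
    if words[:3] == ["the", "day", "before"]:
        return Minus(parse(words[3:]), 1)
    raise ValueError(f"cannot parse: {' '.join(words)}")

def normalize(expr: Expr, anchor: date) -> date:
    """Evaluate the operator tree recursively against the anchor date."""
    if isinstance(expr, Anchor):
        return anchor
    if isinstance(expr, Minus):
        return normalize(expr.base, anchor) - timedelta(days=expr.days)
    raise TypeError(f"unknown operator: {expr!r}")

if __name__ == "__main__":
    tree = parse("the day before yesterday".split())
    print(tree)
    print(normalize(tree, date(2013, 4, 14)).isoformat())  # 2013-04-12
```

Keeping parsing and evaluation separate mirrors the two-step design the abstract describes: the tree is built once from the words and only later grounded against a particular anchor time.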

Applying automatically parsed corpora to the study of language variation

In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets that can be obtained using automatic annotation can help drive further research in this direction, providing sufficient data for the inc...

Publication date: 2014